Arabic/English Word Translation Disambiguation using Parallel Corpora and Matching Schemes

Authors

  • Farag Ahmed
  • Andreas Nürnberger

Abstract

The limited coverage of available Arabic language lexicons poses a serious challenge for Arabic cross-language information retrieval. Translation in cross-language information retrieval consists of assigning to the intended query one of the semantic representation terms in the target language. Besides the incompleteness of the dictionary, we also face the problem of deciding which of the translations the dictionary proposes for each query term should be included in the translated query. In this paper, we describe the implementation and evaluation of an Arabic/English word translation disambiguation approach that exploits a large bilingual corpus and statistical co-occurrence to find the correct sense for the query translation terms. The correct translations of a given query term are determined based on their cohesion with words in the training corpus and a special similarity score measure. The specific properties of the Arabic language that frequently hinder correct matching are taken into account.
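The abstract names the ingredients (dictionary candidates, co-occurrence statistics from a bilingual corpus, Arabic-specific matching) but not the exact scoring. The Python sketch below illustrates the general idea under stated assumptions: it normalizes Arabic query terms with a few common orthographic rules, looks them up in a hypothetical toy dictionary, and picks the combination of English candidates with the highest pairwise Dice coefficient over sentence-level co-occurrence in English sentences. The Dice score is a stand-in for the paper's unspecified "special similarity score"; all function names, rules, and data are hypothetical, and this is not the authors' implementation.

```python
import re
from collections import Counter
from itertools import combinations, product

# ASSUMED normalization rules (common in Arabic IR, not taken from the paper).
_DIACRITICS_AND_TATWEEL = re.compile("[\u064B-\u0652\u0640]")
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below


def normalize_arabic(word: str) -> str:
    """Strip short-vowel marks and tatweel, then unify common letter variants."""
    word = _DIACRITICS_AND_TATWEEL.sub("", word)
    word = _ALEF_VARIANTS.sub("\u0627", word)  # -> bare alef
    word = word.replace("\u0629", "\u0647")  # ta marbuta -> ha
    return word.replace("\u0649", "\u064A")  # alef maqsura -> ya


def cooccurrence_counts(sentences):
    """Sentence-level frequencies for single words and unordered word pairs."""
    word_df, pair_df = Counter(), Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(words, 2))
    return word_df, pair_df


def dice(a, b, word_df, pair_df):
    """Dice association 2 * df(a, b) / (df(a) + df(b)); 0.0 if neither word occurs."""
    total = word_df[a] + word_df[b]
    return 2 * pair_df[frozenset((a, b))] / total if total else 0.0


def select_translations(query, dictionary, english_sentences):
    """Pick one English candidate per query term, maximizing summed pairwise Dice.

    Query terms missing from the (hypothetical) dictionary are skipped.
    Every combination is enumerated, which is only practical for short queries.
    """
    word_df, pair_df = cooccurrence_counts(english_sentences)
    candidates = [dictionary[normalize_arabic(t)] for t in query
                  if dictionary.get(normalize_arabic(t))]
    best, best_score = [], -1.0
    for combo in product(*candidates):
        score = sum(dice(a, b, word_df, pair_df) for a, b in combinations(combo, 2))
        if score > best_score:
            best, best_score = list(combo), score
    return best


# Hypothetical toy data: "عين" can mean eye, spring (of water), or spy.
TOY_DICTIONARY = {"عين": ["eye", "spring", "spy"], "ماء": ["water"]}
TOY_CORPUS = [
    "the spring water was cold",
    "fresh water from the spring",
    "the doctor examined his eye",
    "the spy was arrested",
]

if __name__ == "__main__":
    # The query term carries diacritics; normalization lets it match the dictionary key.
    print(select_translations(["عَيْن", "ماء"], TOY_DICTIONARY, TOY_CORPUS))
```

With this toy data, "spring" and "water" co-occur in two sentences while "eye" and "spy" never appear with "water", so the sketch selects "spring". A realistic system would also need smoothing for sparse counts and a search strategy that scales beyond exhaustive enumeration.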

Similar resources

Corpora based Approach for Arabic/English Word Translation Disambiguation

We present a word sense disambiguation method applied in the automatic translation of a query from Arabic into English. The developed machine learning approach is based on statistical models that learn from parallel corpora by analysing the relations between the items in these corpora in order to use them in the word sense disambiguation task. The relations between items in this...

Word Translation Disambiguation without Parallel Texts∗

Word Translation Disambiguation means to select the best translation(s) given a source word in context and a set of target candidates. Two approaches to determining similarity between input and sample context are presented, using n-gram and vector space models with huge annotated monolingual corpora as main knowledge source, rather than relying on large parallel corpora. Experiments on SemEval’...

Exploiting Parallel Corpora for Supervised Word-Sense Disambiguation in English-Hungarian Machine Translation

In this paper we present an experiment to automatically generate annotated training corpora for a supervised word sense disambiguation module operating in an English-Hungarian and a Hungarian-English machine translation system. Training examples for the WSD module are produced by annotating ambiguous lexical items in the source language (words having several possible translations) with their pr...

Word Sense Disambiguation Using Automatically Translated Sense Examples

We present an unsupervised approach to Word Sense Disambiguation (WSD). We automatically acquire English sense examples using an English-Chinese bilingual dictionary, Chinese monolingual corpora and Chinese-English machine translation software. We then train machine learning classifiers on these sense examples and test them on two gold standard English WSD datasets, one for binary and the other...

Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models

Word alignment in bilingual corpora has been an active research topic in the Machine Translation research community. A corpus is a body of text collections, which are useful for Natural Language Processing (NLP). Parallel text alignment is the identification of the corresponding sentences in a parallel text. Large collections of parallel text are a prerequisite for many areas of linguistic research. Para...

Publication date: 2008